Sparse optimization on measures with over-parameterized gradient descent
Authors
Abstract
Minimizing a convex function of a measure with a sparsity-inducing penalty is a typical problem arising, e.g., in sparse spikes deconvolution or two-layer neural networks training. We show that this problem can be solved by discretizing the measure and running non-convex gradient descent on the positions and weights of the particles. For measures on a d-dimensional manifold and under some non-degeneracy assumptions, this leads to a global optimization algorithm with a complexity scaling as $$\log (1/\epsilon )$$ in the desired accuracy $$\epsilon $$, instead of $$\epsilon ^{-d}$$ for convex methods. The key theoretical tools are a local convergence analysis in Wasserstein space and an analysis of a perturbed mirror descent in the space of measures. Our bounds involve quantities exponential in d, which is unavoidable under our assumptions.
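As a concrete illustration of the discretize-then-descend scheme sketched in the abstract, here is a minimal toy example (not the paper's implementation: the sinusoidal measurement features, the l1 weight penalty standing in for the total-variation norm, the step size, and the particle count are all assumptions for a one-dimensional instance). A signed measure is represented as a weighted sum of particles, and plain gradient descent is run jointly on the particle positions and weights.

```python
# Minimal sketch: represent mu = sum_i w_i * delta_{x_i} and run gradient descent
# on the positions x_i and the weights w_i.  Toy objective (an assumption, not the
# paper's setup): fit low-frequency Fourier moments of a 3-spike measure on [0, 1],
# plus an l1 penalty on the weights as a stand-in for the total-variation norm.
import numpy as np

rng = np.random.default_rng(0)

true_pos = np.array([0.2, 0.5, 0.8])      # ground-truth spike locations
true_w = np.array([1.0, -0.7, 0.5])       # ground-truth signed weights
freqs = np.arange(1, 21)                  # measured frequencies

def features(x):
    """Feature map phi(x): cosine/sine moments of a Dirac at each position."""
    ang = 2 * np.pi * np.outer(freqs, x)                       # (K, m)
    return np.concatenate([np.cos(ang), np.sin(ang)], axis=0)  # (2K, m)

y = features(true_pos) @ true_w           # observed moments of the true measure

def loss_and_grads(pos, w, lam=1e-3):
    """F(mu) = 0.5*||Phi(mu) - y||^2 + lam*sum_i |w_i| and its gradients."""
    ang = 2 * np.pi * np.outer(freqs, pos)
    Phi = np.concatenate([np.cos(ang), np.sin(ang)], axis=0)
    resid = Phi @ w - y
    loss = 0.5 * resid @ resid + lam * np.abs(w).sum()
    grad_w = Phi.T @ resid + lam * np.sign(w)                  # subgradient in w
    scale = 2 * np.pi * np.concatenate([freqs, freqs])[:, None]
    dPhi = scale * np.concatenate([-np.sin(ang), np.cos(ang)], axis=0)
    grad_pos = w * (dPhi * resid[:, None]).sum(axis=0)         # chain rule in x_i
    return loss, grad_pos, grad_w

# Over-parameterize: many more particles than true spikes, spread over the domain.
m = 50
pos = rng.uniform(0.0, 1.0, size=m)
w = 1e-3 * rng.standard_normal(m)

step = 2e-3
for _ in range(30000):
    loss, g_pos, g_w = loss_and_grads(pos, w)
    pos -= step * g_pos
    w -= step * g_w

print(f"final objective: {loss:.3e}")
for i in np.argsort(-np.abs(w))[:5]:      # heaviest particles, to compare with
    print(f"  particle at {pos[i]:.3f}, weight {w[i]:+.3f}")  # spikes at 0.2/0.5/0.8
```

Printing the heaviest particles makes it easy to compare against the true spikes at 0.2, 0.5 and 0.8; the paper's analysis addresses when such over-parameterized descent reaches the global optimum despite the non-convexity in the positions.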
Related articles
Sparse Communication for Distributed Gradient Descent
We make distributed stochastic gradient descent faster by exchanging sparse updates instead of dense updates. Gradient updates are positively skewed as most updates are near zero, so we map the 99% smallest updates (by absolute value) to zero then exchange sparse matrices. This method can be combined with quantization to further improve the compression. We explore different configurations and a...
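A minimal sketch of the thresholding step described in this snippet (the 1% cutoff matches the text; the helper names, the dense NumPy gradient, and the index/value encoding are assumptions, and details of the original method, such as how dropped updates are treated later, are not reproduced):

```python
# Keep only the largest 1% of gradient entries by absolute value and exchange
# them as (index, value) pairs; the other 99% are treated as zero.
import numpy as np

def sparsify_top1pct(grad):
    """Return (indices, values) of the top 1% of entries of `grad` by magnitude."""
    flat = grad.ravel()
    k = max(1, int(0.01 * flat.size))
    idx = np.argpartition(np.abs(flat), -k)[-k:]   # k largest |entries|
    return idx, flat[idx]

def densify(idx, vals, shape):
    """Rebuild a dense gradient with all dropped entries set to zero."""
    flat = np.zeros(int(np.prod(shape)))
    flat[idx] = vals
    return flat.reshape(shape)

grad = np.random.default_rng(0).standard_normal((1000, 100))
idx, vals = sparsify_top1pct(grad)
sparse_grad = densify(idx, vals, grad.shape)       # what a receiving worker would apply
print(f"exchanged {idx.size} of {grad.size} entries ({100 * idx.size / grad.size:.1f}%)")
```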
Dictionary Learning with Large Step Gradient Descent for Sparse Representations
This work presents a new algorithm for dictionary learning. Existing algorithms such as MOD and K-SVD often fail to find the best dictionary because they get trapped in a local minimum. Olshausen and Field’s Sparsenet algorithm relies on a fixed step projected gradient descent. With the right step, it can avoid local minima and converge towards the global minimum. The problem then becomes to fi...
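For concreteness, here is a small sketch of one fixed-step projected gradient update for the dictionary in a sparse-coding model X ≈ DA, with the sparse codes held fixed during the step; the synthetic data, the step size, and the unit-norm column projection are illustrative assumptions rather than the authors' exact Sparsenet setup.

```python
# One fixed-step projected gradient update on the dictionary D for
# 0.5 * ||X - D A||_F^2, followed by projecting each atom onto the unit sphere.
import numpy as np

def dictionary_step(D, A, X, step):
    """Gradient step on the reconstruction error, then re-normalize the atoms."""
    grad = (D @ A - X) @ A.T                 # gradient of 0.5*||X - D A||_F^2 in D
    D = D - step * grad                      # fixed (possibly large) step
    norms = np.maximum(np.linalg.norm(D, axis=0), 1e-12)
    return D / norms                         # project atoms back to unit l2 norm

rng = np.random.default_rng(0)
n, k, m = 20, 30, 200                        # signal dim, number of atoms, samples
D_true = rng.standard_normal((n, k)); D_true /= np.linalg.norm(D_true, axis=0)
A = rng.standard_normal((k, m)) * (rng.random((k, m)) < 0.1)   # sparse codes
X = D_true @ A                               # synthetic training signals

D = rng.standard_normal((n, k)); D /= np.linalg.norm(D, axis=0)
for _ in range(2000):
    D = dictionary_step(D, A, X, step=5e-3)
print("relative residual:", np.linalg.norm(X - D @ A) / np.linalg.norm(X))
```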
Multiple-gradient Descent Algorithm for Multiobjective Optimization
The steepest-descent method is a well-known and effective single-objective descent algorithm when the gradient of the objective function is known. Here, we propose a particular generalization of this method to multi-objective optimization by considering the concurrent minimization of n smooth criteria {J_i} (i = 1, ..., n). The novel algorithm is based on the following observation: consider a...
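As a worked toy example of this observation in the two-criteria case (the quadratic objectives and step size below are assumptions), the minimum-norm point of the convex hull of the two gradients has a closed form and, when nonzero, its negative is a direction along which both criteria decrease:

```python
# Two-objective sketch of the common-descent-direction idea: take the
# minimum-norm convex combination of the gradients and step along its negative.
import numpy as np

def common_descent_direction(g1, g2):
    """Minimum-norm point of the convex hull {a*g1 + (1-a)*g2 : a in [0, 1]}."""
    diff = g1 - g2
    denom = diff @ diff
    a = 0.5 if denom == 0 else float(np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0))
    return a * g1 + (1 - a) * g2

# Two smooth criteria on R^2 with different minimizers (illustrative choice).
grad_J1 = lambda x: 2 * (x - np.array([1.0, 0.0]))
grad_J2 = lambda x: 2 * (x - np.array([0.0, 1.0]))

x = np.array([2.0, 2.0])
for _ in range(200):
    d = common_descent_direction(grad_J1(x), grad_J2(x))
    if np.linalg.norm(d) < 1e-9:             # (approximately) Pareto-stationary
        break
    x = x - 0.1 * d                          # both criteria decrease along -d
print("approximately Pareto-stationary point:", x)
```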
Simple2Complex: Global Optimization by Gradient Descent
A method named simple2complex for modeling and training deep neural networks is proposed. Simple2complex trains deep neural networks by smoothly adding more and more layers to a shallow network; as the learning procedure goes on, the network effectively grows. Compared with end-to-end learning, simple2complex is less likely to become trapped in a local minimum, namely, owning the ability for...
Convex Optimization: A Gradient Descent Approach
This algorithm is introduced by [Zin]. Assuming the losses $$\{l_t(\cdot )\}_{t=1}^{T}$$ are differentiable, start with some arbitrary $$x_1 \in X$$ and, for $$t = 1, \dots , T$$: set $$z_{t+1} \leftarrow x_t - \eta \nabla l_t(x_t)$$ and $$x_{t+1} \leftarrow \Pi _X(z_{t+1})$$, where $$\Pi _X(\cdot )$$ is the projection onto $$X$$. The performance of OGD is described as follows. Theorem: if there exist positive constants G, D such that $$\Vert \nabla l_t(x)\Vert _2 \le G$$ for all $$t$$ and $$x \in X$$, and $$\Vert X\Vert _2 := \max _{x,y \in X} \Vert x - y\Vert _2 \le D$$...
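A short runnable sketch of the projected update above; the feasible set (a Euclidean ball), the linear losses, and the 1/√t step size are assumptions made for illustration.

```python
# Projected online gradient descent: z_{t+1} = x_t - eta * grad(l_t)(x_t),
# then x_{t+1} = Pi_X(z_{t+1}) with Pi_X the Euclidean projection onto X.
import numpy as np

def project_l2_ball(z, radius=1.0):
    """Projection Pi_X onto X = {x : ||x||_2 <= radius}."""
    norm = np.linalg.norm(z)
    return z if norm <= radius else z * (radius / norm)

rng = np.random.default_rng(0)
d, T = 5, 1000
x = np.zeros(d)                              # arbitrary x_1 in X
total_loss = 0.0
for t in range(1, T + 1):
    theta = rng.standard_normal(d)           # linear loss l_t(x) = <theta, x>
    total_loss += theta @ x
    eta = 1.0 / np.sqrt(t)                   # decaying step, roughly D/(G*sqrt(t))
    x = project_l2_ball(x - eta * theta)     # gradient step + projection
print(f"average loss over T={T} rounds: {total_loss / T:+.4f}")
```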
Journal
Journal title: Mathematical Programming
Year: 2021
ISSN: 0025-5610, 1436-4646
DOI: https://doi.org/10.1007/s10107-021-01636-z